FigureSeer: Parsing Result-Figures in Research Papers
Authors: Noah Siegel, Zachary Horvitz, Roie Levin, Santosh Divvala, and Ali Farhadi
Abstract
‘Which are the pedestrian detectors that yield a precision above 95% at 25% recall?’ Answering such a complex query involves identifying and analyzing the results reported in figures across several research papers. Despite the availability of excellent academic search engines, retrieving such information remains a cumbersome challenge today, as these systems have primarily focused on understanding the text content of scholarly documents. In this paper, we introduce FigureSeer, an end-to-end framework for parsing result-figures that enables powerful search and retrieval of results in research papers. Our proposed approach automatically localizes figures from research papers, classifies them, and analyzes the content of the result-figures. The key challenge in analyzing the figure content is the extraction of the plotted data and its association with the legend entries. We address this challenge by formulating a novel graph-based reasoning approach using a CNN-based similarity metric. We present a thorough evaluation on a real-world annotated dataset to demonstrate the efficacy of our approach.

1 Computer Vision for Scholarly Big Data

Academic research is flourishing at an unprecedented pace. There are already over 100 million papers on the web [1] and many thousands more are being added every month [2]. It is a Sisyphean ordeal for any single human to cut through this information overload and stay abreast of the details of all the important results across all relevant datasets within any given area of research. While academic-search engines like Google Scholar, CiteSeer, etc., are helping us discover relevant information with more ease, these systems are inherently limited by the fact that their data mining and indexing is restricted to the text content of the papers.
Research papers often use figures for reporting quantitative results and analysis, as figures provide an easy means for communicating the key experimental observations [3]. In many cases, the crucial inferences from the figures are not explicitly stated in the text (as humans can easily deduce them visually) [4]. Therefore, failing to parse the figure content poses a fundamental limitation to discovering important citations and references. This paper presents FigureSeer, a fully-automated framework for unveiling the untapped wealth of figure content in scholarly articles (see figure 1).

Why is figure parsing hard? Given the impressive advances in the analysis of natural scene images witnessed over the past years, one may speculate that parsing scholarly figures is a trivial endeavor. While it is true that scholarly figures are more structured than images of our natural world, inspecting the actual figure data exposes a plethora of complex vision challenges:

Strict requirements: Scholarly figures demand exceptionally high levels of parsing accuracy, unlike typical natural image parsing tasks. For example, in figure 2(c), even a small error in parsing the figure plot data changes the ordering of the results, thereby leading to incorrect inferences.

High variation: The structure and formatting of scholarly figures varies greatly across different papers. Despite much research toward common design principles, no consensus seems to have been reached yet [5, 6]. Different design conventions are therefore employed by authors in generating their figures, resulting in wide variations (see figure 2).

Heavy clutter and deformation: Even in the best-case scenario, where figures with a common design convention are presented, there remains the difficulty of identifying and extracting the plot data amidst heavy clutter, deformation, and occlusion within the plot area.
For example, in figure 2(d), given just the legend symbol template for the ‘h3 LM-HOP availability’ method, extracting its plot data is non-trivial due to the heavy clutter and deformation (also see figure 4). While color is an extremely valuable cue for discriminating the plot data, it may not always be available: many figures reuse similar colors (see figure 2), and many older papers (even some new ones [7, 8]) are published in grayscale. Moreover, unlike natural image recognition tasks, where the desired amount of labeled training data can be obtained to train models per category, figure parsing poses the additional challenge that only a single exemplar (i.e., the legend symbol) is available for model learning. All these challenges have discouraged contemporary document retrieval systems from harvesting figure content other than simple meta-data like caption text.

Fig. 2: There is high variation in the formatting of figures: some figures position the legend within the plot area, while others place it outside. Within the legend, some figures have symbols on the right of the text, while others on the left. The presence of heavy occlusions and deformations also poses a challenge.

Overview: The primary focus of our work is to parse result-figures within research papers to help improve search and retrieval of relevant information in the academic domain. The input to our parsing system is a research paper in .pdf format and the output is a structured representation of all the result-figures within it. The representation includes a detailed parse of each figure in terms of its axes, legends, and their corresponding individual plot data. We focus our attention on result-figures as they summarize the key experimental analysis within a paper. More specifically, within our corpus of papers, we found 2D graphical plots of continuous data (such as precision-recall, ROC curves, etc.)
to be most popular and frequent. In this paper, we present a novel end-to-end framework that automatically localizes all figures in a research paper, classifies them, and extracts the content of the result-figures. Our proposed approach can localize a variety of figures, including those containing multiple sub-figures, and also classify them with great success by leveraging deep neural nets. To address the challenges in parsing the figure content, we present a novel graph-based reasoning approach using convolutional neural network (CNN) based similarity functions. Our approach is attractive as it not only handles the problems of clutter and deformation, but is also robust to variations in figure design. As part of this work, we also introduce thorough evaluation metrics, along with a fully-annotated real-world dataset, to demonstrate the efficacy of our parsing approach. Finally, to demonstrate the potential unleashed by our approach, we present a query-answering system that allows users to query figures and retrieve important information.

In summary, our key contributions are: (i) We introduce and study the problem of scholarly figure parsing. (ii) We present a novel end-to-end framework that automatically localizes figures, classifies them, and analyzes their content. (iii) We present a thorough evaluation on a real-world dataset to demonstrate the efficacy of our approach. (iv) We demonstrate the utility of our parsing approach by presenting a query-answering system that enables powerful search and retrieval of results in research papers using rich semantic queries. (v) Finally, we release a fully-annotated dataset, along with a real-world end-to-end system, to spur further research. We hope our work will help kick-start the challenging domain of vision for scholarly big data.
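The flavor of the graph-based reasoning described above — associating a legend entry with its plotted curve amid clutter — can be illustrated with a minimal dynamic-programming sketch. This is not the paper's actual model: the CNN-based similarity metric is replaced here by a generic per-pixel dissimilarity array supplied by the caller, and the graph edges use a simple L1 vertical-jump penalty; both are assumptions made only for illustration.

```python
import numpy as np

def trace_curve(cost, smoothness=1.0):
    """Trace one curve left-to-right through a plot as a min-cost path.

    cost[x, y] is the dissimilarity between the pixel at column x, row y
    and the curve's legend symbol (in FigureSeer this score would come
    from a learned CNN-based similarity; here it is just an input array).
    Dynamic programming selects one row per column, penalizing large
    vertical jumps so the traced curve stays smooth despite clutter.
    """
    n_x, n_y = cost.shape
    rows = np.arange(n_y)
    # jump[i, j] = penalty for moving from row i (column x-1) to row j (column x)
    jump = smoothness * np.abs(rows[:, None] - rows[None, :])
    total = cost[0].copy()               # best path cost ending at each row
    back = np.zeros((n_x, n_y), dtype=int)
    for x in range(1, n_x):
        cand = total[:, None] + jump     # cand[i, j]: come from row i to row j
        back[x] = np.argmin(cand, axis=0)
        total = cand[back[x], rows] + cost[x]
    # Backtrack the optimal row for every column.
    path = np.empty(n_x, dtype=int)
    path[-1] = int(np.argmin(total))
    for x in range(n_x - 1, 0, -1):
        path[x - 1] = back[x, path[x]]
    return path
```

With a synthetic cost map that is cheap along a diagonal, the recovered path follows that diagonal even though every column offers competing candidates — the same trade-off (appearance similarity vs. curve continuity) that makes a graph formulation robust to occlusion and crossing curves.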